Search for: All records

Creators/Authors contains: "Queen, Owen"

  1. Organisms that live in different environments face different evolutionary pressures. Organisms with more successful phenotypes reproduce more frequently, and the differing selective pressures that act at the organismal level can in turn shape genes, and thus proteins. Understanding how proteins adapt across environments may therefore be useful both for engineering proteins for specific environments and for improving our understanding of basic biology. In this work, we explicitly compare homologous (read: paired) proteins from different environments. While previous studies have explored the relevant evolutionary pressures in one of these environments [11], [17] and genomic responses to those pressures [1], [28], no prior computational study of their proteins has been performed. We apply ESM-2 [20]. As expected, there is no signal in our negative control (two divergent yeast strains), yet we obtain near-perfect prediction accuracy for our selected environmental gradient: the well-established subsurface vs. surface biome. We further show that ESM-2, even in its smallest model, captures relevant fine-grained biological patterns in its embedding space. Significantly, we demonstrate that these embeddings can be interpreted using a novel visualization pipeline built using explainable AI techniques.
  2. Characterizing human proteins remains a major challenge: approximately 29% of human proteins lack experimentally validated functions and even well-annotated proteins often lack context-specific phenotypic insights. To enable universal modeling of protein phenotypes, we present ProCyon, a multimodal foundation model that utilizes protein sequence, structure, and natural language for generating and predicting protein phenotypes across diverse knowledge domains. ProCyon is trained on our novel dataset, ProCyon-Instruct, with 33 million protein phenotype instructions. On dozens of benchmarking tasks, ProCyon performs competitively against single-modal and multimodal models. Further, ProCyon conditionally retrieves proteins via mechanisms of action of small molecule drugs and disease contexts, and it generates candidate phenotypic descriptions for poorly characterized proteins, including those implicated in Parkinson’s disease that were identified after ProCyon’s knowledge cutoff date. We experimentally confirm ProCyon’s predictions in multiple sclerosis using post-mortem brain RNA-seq, identifying novel MS genes and elucidating associated pathway mechanisms consistent with cortical pathology. ProCyon paves the way toward a general approach to generate functional insights into the human proteome.